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This  paper  is  concerned  with  the  following  question  and  its 

ramifications.  The  goal  is  to  compute  some  or  all  of  the  eigenvalues 

of  a  square  matrix  B  which  is  not  symmetric.  There  is  on  hand  an  approx- 

★ 

imate  column  eigenvector  x,  an  approximate  row  eigenvector  y  ,  and  a 

number  y.  In  addition  someone  has  computed  the  nprms  of  their  residual 

...  •  '  ■■■■'  ■  r  '  0\ 

vectors,  Bril  and  lls*i,  where 

■i-.x .,xt  r  • 

r  =  Bx-xy,  s*  =  y*B-yy*  . 

It  turns  out  that  Jrl/Bxll  £  10"^  D B I  and  Bs*ll/By*B  £  10"®  IIBB.  How  good 

is  y  as  an  approximate  eigenvalue  of  B?  ^ _ 

/ 

a..*  The  experts  know  that  nothing  very  useful  can  be  said.  A  priori 
bounds  on  |x-y|  are  known  (see  Section  2),  where  X  is  some  eignenvalue 
of  B.  Moreover  these  bounds  are  best  possible,  in  the  sense  that  equality 
can  be  achieved,  but  either  they  involve  auxiliary  terms,  such  as  the 
matrix  which  transforms  B  into  its  Jordan  canonical  form,  and  consequently 
are  virtually  impossible  to  compute, or  else  they  are  hopelessly 
pessimistic  in  the  majority  of  cases. 

This  is  in  sad  contrast  to  the  symmetric  case  where  various  well 
known  error  bounds  enjoy  three  remarkable  properties: 

(a)  they  are  best  possible  inferences  from  the  information, 

(3)  they  are  computable,  , 

(y)  they  are  not  asymptotic. 

By  (y)  we  mean  that  the  error  bounds  are  not  based  on  perturbation  theory 
where  the  neglect  of  certain  complicated  terms  is  justified  only  when 
the  error  Is  "small  enough,"  No,  the  bounds  to  which  we  refer  are 
valid  however  bad  the  best  approximations  may  be. 


-2- 


When  approximations  to  several  different  eigenvectors  are  known  then, 
again  in  the  symmetric  case,  the  Rayleigh-Ritz  procedure  tells  us  how  to 
make  the  best  approximations  to  eigenvalues  and  tells  us  what  residuals 
to  compute  in  order  to  have  error  bounds  for  these  approximations. 

This  theory  is  too  useful  to  be  abandoned  and  indeed  the  Ritz 
procedure  is  readily  generalized  as  the  Galerkin  approximation.  What  is 
not  discussed  (to  our  knowledge)  is  whether  the  Galerkin  approximations 
are  optimal  in  any  useful  sense. 

It  seems  that  the  only  way  to  extend  the  optimality  properties  is 
by  making  a  radical  change  in  the  notion  of  error.  Experience  shows  that 
the  first  time  readers  are  exposed  to  this  viewpoint  they  often  feel 
that  it  dodges  real  difficulties.  It  does.  Rather  than  scale  the 
vertical  face  of  a  mountain  some  people  prefer  to  take  an  easier  indirect 
route  round  the  back  which  quite  often  leads  to  the  peak. 

In  the  context  of  the  problem  addressed  at  the  outset  the  idea  is 
as  follows. 

Ask  not  for  the  value  of  }  X-y |  but  instead  ask  for  HB-B!II  where 
B'  is  the  closest  matrix  to  B  for  which  y  is  an  eigenvalue  and 
x  and  y*  are  its  eigenvectors. 

This  idea  has  been  used  with  great  success  by  J.  H.  Wilkinson  in  a 
variety  of  matrix  problems  and  is  often  called  a  "backward"  error  analysis 
because  it  casts  the  errors  back  as  an  equivalent  change  to  the  original 
data.  If  B  is  not  known  exactly  it  sometimes  happens  that  B’  is 
indistinguishable  from  B  and  then  y  is  as  good  an  answer  as  the  data 
warrants. 

Once  this  change  has  been  made  it  is  not  very  difficult  to  see  in 

what  way  the  approximations  are  optimal  but,  to  the  best  of  our 

of  Rayleigh-Ritz 

knowledge,  the  full  extension/has  not  been  made.  A  possible  reason  for 


this  omission  is  the  fact  that  only  recently  have  there  been  serious 
efforts  to  compute  some  eigenvalues  of  large  (n>1000)  nonnormal  matrices. + 

It  was  our  desire  to  find  a  good  stopping  criterion  for  the  unsymmetric 
Lanczos  process  which  prompted  this  work.  All  expressions  must  be 
computable.  Our  goal  (with  apologies  to  Richard  Hamming)  is  numbers,  not 
insight. 

Last  but  not  least  the  error  expressions  of  §  4  can  be  combined  with 
computable  condition  numbers  to  give  (exactly)  the  first  term  in  the 
perturbation  series  for  |X-y|  in  powers  of  D B-B *  11 . 

In  §2  we  sketch  the  theory  we  want  to  generalize  and  exhibit  some 
traditional  error  bounds  for  the  nonnormal  case.  In  §3  we  collect  important 
preliminary  facts.  In  §4  comes  the  main  result,  and  in  55  the  application 
to  the  Lanczos  method.  We  follow  Householder  conventions  in  notation: 
capital  letters  for  matrices,  lower  case  roman  letters  for  column  vectors, 

Li 

and  lower  case  greek  letters  for  scalars.  However  x*  (not  x  )  denotes 
the  conjugate  transpose  of  x,  and  X . (B)  denotes  the  1th  eigenvalue  of  B 
for  any  given  ordering.  The  norms  we  use  are  defined  in  Section  3.  One 
idiosyncracy  is  to  use  syrrmetric  letters  (A,H,M,...)  for  symmetric  (or 
Hermitian)  matrices. 

2.  SOME  ERROR  BOUNDS 
Symmetric  case 

Theorem  1.  For  any  nonzero  vector  x  and  any  scalar  y  there  is  a 

an  eigenvalue  of  A  such  that 
|X-y[  <  lAx-xyl/HxB . 

^For  small  matrices  there  are  alternative  techniques  based  on  skillfully 
chosen  Gersgorin  disks,  [Wilkinson,  1965]. 
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Theorem  2.  For  any  nonzero  vector  x  and  p  e  x*Ax/x*x, 

U Ax-xpil  s  min  BAx-xyH  . 

_ _ Y _ 

Theorem  3.  (Rayleigh-Ritz) .  Given  any  nxk  Z  satisfying  Z*Z  =  Ik 
the  best  set  of  k  approximate  eigenvalues  of  A  that  can  be  derived  from 
A  is  the  spectrum  of  Z*AZ.  The  choice  H  =  Z*AZ  uniquely  minimizes,  over 
all  kxk  matrices  H  the  residual  norm 
S AZ-ZHlI F  . 

If  G  is  the  eigenvector  matrix  of  Z*AZ  then  the  columns  of  ZG  are  the 

2 

best  set  of  approximate  eigenvectors.  n C ;| p  r  trace  (C*C). 

Theorem  4.  (Error  Bounds).  There  are  k  eigenvalues  of  A,  call 
them  * . . . , ,  which  can  be  put  into  one-one  correspondence  with  the 
k  eigenvalues  ^,...,9^  of  any  symmetric  kxk  matrix  H  so  that,  given  Z, 

maxj\.-9i;  <  SI AZ-ZHil , 

k  «  » 

V  (X.-6,r  <  il AZ-ZHlI i  . 

i=l  1  1  "  F 


Discussion  of  these  classical  results,  and  proofs,  can  be  found  in 
[Parlett,  1980  Chaps.  4  and  11].  More  refined  results  than  these  are 
known  and  some  of  them  may  be  found  in  [Kato,  1966]  and  [Parlett,  1980]. 

We  list  an  important  residual  bound  which  gives  considerable  insight 
but  is  not  readily  computable. 

Theorem  5.  If  A  is  the  eigenvalues  of  A  closest  to  p(x)  (=  x*  Ax/x*x) 
and  o  is  the  separation  of  a  from  the  next  closest  eigenvalue  then 

iX-o!  <_  ( n Ax-xc il / A x II )^/o  . 

If  X  is  simple  and  z  is  its  eigenvector  then 

|sin^_(x,z)j  <_  lAx-xpl/( 5xil  5)  . 


The  General  Case 


We  begin  with  an  example  which  shows  that  the  residual  norms  of  the 
trio  (y,x,y*)  with  respect  to  B  can  be  tiny  despite  a  large  gap 
separating  y  from  the  spectrum  of  B. 


B  *  (b,,)? 


1  1  J  » 
i  <  J  • 


II Blip  «*  r\/fZ,  B's  eigenvalues  are  all  0. 

x*  =  (2n_2,2n~3, _ ,2,1,1),  llx»2  =  (4n_1+2)/3 

y*  =  (l,l,2,...,2n_3,2n"2),  II y I  =  »xl  . 

Bx-x  •  1  «(0,...,0,-l)*,  yVl-y*  =  (-1 ,0,. . .  ,0) . 

Thus  II Bx-x- 1 1 / II xll  =  ly*B-l  -y*l/ly*l  *  /T/2n_1 , 

yet  mi n  |  X-y |  / II Bit p  =  1  / II B H  _  z  t/5/n . 

X 

The  reader  is  invited  to  concoct  a  similar,  but  somewhat  more 
complicated  example  in  which  the  eigenvalues  are  evenly  spaced.  Indeed 
it  is  not  the  fact  that  B  above  has  a  Jordan  block  of  order  n  which 
causes  the  phenomenon,  rather  it  is  B's  excessive  departure  from 
normality  which  is  responsible.  (A  matrix  is  normal  if  BB*  =  B*B.) 
Sometimes,  but  not  always,  departure  from  normality  is  explained  by 
proximity  (in  matrix  space)  to  a  matrix  with  large  Jordan  blocks. 

With  one  exception  we  have  found  no  computable  error  bounds  for  the 
eigenvalues  of  nonnormal  matrices  which  explicitly  involve  residual 
norms.  Furthermore,  in  [Householder,  1972]  we  read  "To  the  best  of  my 
knowledge  no  one  has  obtained  inclusion  theorems  that  apply  to  a  matrix 
having  nonlinear  elementary  divisors."  An  inclusion  theorem  describes 
a  region  of  the  complex  plane  guaranteed  to  contain  an  eigenvalue  (but 
not  all  the  eigenvalues).  So  this  quotation  makes  our  quest  for 
computable  inclusion  theorems  look  hopeless. 
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Faced  with  this  impasse  we  give  some  results  on  spectral  variation 


and  these  point  up  the  difficulties.  Actually  residual  norms  can  be 
linked  to  changes  in  the  matrix  to  good  advantage,  as  is  done  in  Section  4. 


Theorem  6  [Ostrowski,  1957] 


n  n 


Let  C  =  B-E  with  u  =  max{ jb. . | , |c. . | },  6  =  £  £  |e..|/y.  Then 

'  J  *  w  “ 


i=l  j=l 


1.  To  each  X - (C)  there  is  a  A.(B)  such  that  | A. (C)-A.(B) |  £  (n+2)p<$1/,n 

*  J  1  J 

2.  The  eigenvalues  of  B  and  C  can  be  paired  so  that  j A- (B)-Aj (C) | 

<  2(n+l)2y61/n,  i  =  l,...,n. 


Remark.  The  nth  root  of  6  must  be  there  but  it  does  dampen  one's 
enthusiasm  for  a  priori  error  bounds,  especially  when  n  >  100. 


Theorem  7.  [Bauer  and  Fike,  1960] 

If  C  (=B-E)  is  diagonal izable  (=  semi-simple),  i.e.,  C  =  FAF-1, 
then  to  each  A.(B)  there  is  a  A -(C)  such  that 

•  J 

|  A.  (B)-A.(C)  |  <  1FEF"1  n  <  IF1  IlF’1  II  lEl, 

'  J 

using  any  matrix  norm  for  which  lAl  =  max  |A^|. 


Remark.  If  C  is  normal  then  BFl!  =  IIF”^II  =1  and  the  bound  is  as 
good  as  in  the  symmetric  case. 

Next  we  give  an  extension  of  the  Bauer-Fike  Theorem  to  the  general 


Theorem  8.  Let  C  =  B-E  and  let  J  =  FBF  be  B's  Jordan  form.  To 
any  eigenvalue  p  of  C  there  corresponds  an  eigenvalue  A  of  B  such  that 

- I^l”  ■■  <  IFEF'1! 

(HM)  * 

where  m  is  the  order  of  the  largest  Jordan  block  to  which  A  belongs.  The 
spectral  (or  the  Frobenius)  norm  must  be  used  here. 


an  appendix.  It  is  included  because,  to  the  best  of  our  knowledge,  it 
has  not  appeared  before. 


1 


Remark.  This  is  our  only  example  of  a  computable  residual  bound. 
It  also  seems  to  be  a  counter-example  to  Householder's  remark  quoted 
above.  As  A(B)  -*■  0  the  standard  residual  bound  for  normal  matrices  is 
recovered.  At  the  other  extreme  the  bound  shows  that  small  residuals 
do  not  imply  accurate  eigenvalue  approximations  for  very  abnormal 
matrices.  This  confirms  our  numerical  example. 

Two  valuable  references  for  more  recent  work  on  perturbations  are 
[Stewart,  1973]  and  [Chatelin,  1981].  The  former  concerns  bounds  on 
approximate  eigenvectors  and  invariant  subspaces,  the  latter  uses 
residual  norms, but  their  results  are  more  for  insight  than  computation 
since  they  involve  quantities  which  are  not  readily  computable. 
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3.  ALTERNATIVE  FORMULATIONS 

To  each  eigenvalue  X  of  a  complex  (or  real)  nxn  matrix  B  there 
corresponds  at  least  one  column  eigenvector  x  and  at  least  one  row 
eigenvector  y*  satisfying 

Bx  =  xX,  y*B  =  Ay*  (3.1) 

We  call  the  set  (X,x,y*)  an  eigentriple  or  eigenelement  of  B. 

Associated  with  B  is  its  conjugate  transpose  B*.  Abstractly  B*  can 
be  characterized  as  the  adjoint  of  B  on  (or  in)  Euclidean  n-space  En 
with  inner  product  (•,•).  In  other  words  B*  is  the  unique  matrix  which 
satisfies 

(u,B*v)  =  (Bu,v)  for  aTJ_  u,  v  in  En. 

The  spectrum  of  B*  is  the  conjugate  of  the  spectrum  of  B  and  we  could 
rewrite  (3.1)  as 

Bx  =  xX,  B*y  =  yX  .  (3.2) 

We  have  put  down  these  elementary  facts  because  every  author  discus¬ 
sing  thp  eigenvalue  problem  must,  for  coherence,  choose  and  stick  to 
one  of  uwo  equivalent  formulations,  illustrated  in  (3.1)  and  (3.2): 

(I)  one  operator  and  two  (dual)  spaces;  B,  En,  E*, 

(II)  two  (adjoint)  operators  and  one  space;  B,  B*,  En. 

Our  choice  is  (I);  En  is  the  Euclidean  space  of  column  vectors  and  E* 
is  the  (dual)  space  of  row  vectors.  Although  En  and  E*  are  isomorphic 
in  a  trivial  way  it  helps  to  regard  them  as  distinct  in  this  essay. 
Formulation  I  lends  itself  to  concrete  matrix  notation  and  we  shall  use 
the  familiar  forms: 
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a,b  e  En,  (a,b)  =  b*a  =  l  8. a.,  Hall  =  v/a*a  , 

i=l  1  1 

n  n 

c* »d*  e  E”,(c*,d*)  =  d*c  =  l  S.y,,  llc*ll  5  /c*c  , 

i=l 

The  reward  for  having  this  inner  product  structure  is  that  it  makes 
sense  to  compare  vectors  in  norm  or  speak  of  the  angle  between  a  and 
b  (or  between  c*  and  d*).  Consequently  adjectives  like  small  and  large 
can  be  used  legitimately  and  that  is  essential  for  work  in  matrix 
computations. 

There  is  more  than  one  matrix  norm  compatible  with  Euclidean  space. 

We  shall  confine  ourselves  to  the  two  extreme  unitarily  invariant  matrix 
norms.  The  spectral  (or  bound)  norm  is  defined  by 

IIBII  =  max  II  Bull /Hull  =/T  av(B*B)  ,  (3.3) 

u^O  max 

and  the  Frobenius  (or  Hilbert  Schmidt)  norm  is 

II  Bll  F  =  (  l  l  |b..|2)1/2  =  /trace  (B*B)  .  (3.4) 

i  j  J 

The  latter  is  easy  to  compute  but  yields  II I H  =  /n  for  the  identity  matrix 
instead  of  1.  For  any  B 

II  Bl  <  I  Bll  F  <  /rank(B)  IBS  <  /n  llBH  .  (3.5) 

In  particular,  for  any  square  rank  one  matrix  uv*, 

II  u  v  *  II F  =  II  u  v  *  I  =  lull’Hv*!  .  (3.6) 

The  matrix  norms  are  needed  so  that  we  can  speak  of  the  matrix 
C  being  close  to  B  in  the  sense  that  II C - B II / II B II  (or  IIC-Bllp/iBllp) 
is  small  compared  to  1. 
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In  the  symmetric  case  (B=A)  a  key  role  is  played  by  Rayleigh's 
quotient,  which  is  defined  for  all  nonzero  x  by 

p(x)  =  p(x;A)  e  x*Ax/x*x  . 

The  natural  extension  to  the  general  case  is  defined  for  all  pairs  x,y* 
such  that  y*x  f  0  by 

P(x,y*)  =  p(x,y*;B)  H  y*Bx/y*x  .  (3.7) 

Many  authors  call  this  the  general ized  Rayleigh  quotient  but  the  adjective 
is  superfluous  and  we  will  drop  it.  What  matters  is  that  certain  crucial 
properties  do  carry  over  from  the  nice  symmetric  case,  namely 

(1)  p  is  homogeneous  of  degree  0,  i.e.,  p(ax,By*) =  p(x,y*) 

(2)  p  is  stationary  at,  and  only  at,  the  nondegenerate  eigenvalues 
of  B. 

The  gradients  of  p  are 

Vxp(x,y*)  =  y*[B-p(x,y*)I]/y*x  , 

Vy*p(x,y*)  =  [B-p(x,y*)I]x/y*x  , 

and  the  left  hand  sides  vanish  simultaneously  only  when  (X,x,y*)  is  an 
eigenelement  of  B  and  then  p(x,y*)  =  X.  This  condition  precludes  X 
from  belonging  to  Jordan  blocks  because  the  eigenvectors  x  and  y*  for 
such  X  satisfy  y*x  =  0.  This  anomaly  tells  us  to  reformulate  the 
Rayleigh  Quotient.  If  X  belongs  to  a  single  Jordan  block  of  order  k 
there  are  (many)  nxk  matrices  X  and  Y  such  that  the  columns  of  X  and 
the  rows  of  Y*  span  X's  invariant  subspaces  in  En  and  E*  and,  moreover, 

Y*X  =  1^.  It  turns  out  that  the  kxk  matrix 


p(X,Y*)  =  Y*BX 


(3.8) 


is  similar  to  X's  Jordan  block.  In  general,  for  any  nxk  matrices  X,  Y 
satisfying  Y*X  =  1^  we  say  that  (3.8)  defines  the  Rayleigh  quotient 
of  X  and  Y*. 

In  the  definition  (3.7)  x  and  y*  are  independent  variables  and 
this  freedom  entails  that  there  is  no  upper  bound  on  |p(x,y*)|  in 
terms  of  B.  However  we  do  have 

jy*Bx|  <  lBllxlly*l  .  (3.9) 

4.  OPTIMAL  APPROXIMATIONS  FROM  ROW  AND  COLUMN  SUBSPACES 

We  are  almost  ready  to  exhibit  the  closest  matrix  B-E  to  B  for  which 
eigenelement  approximations  become  exact.  When  the  closest  matrix  is 
close  enough  then  it  can  be  regarded  as  a  perturbation  of  B.  On  the 
other  hand  we  can  just  as  well  regard  B  as  a  perturbation  of  B-E.  To 
be  specific  let  y  be  any  scalar  and  let  q  and  p*  be  any  vectors  satisfying 

P*q  =  1  .  (4.1) 

We  want 

(B-E)q  =  qy,  p*(B-E)  =  yp*  (4.2) 

so  that  y  is  a  simple  eigenvalue  of  B-E.  Standard  perturbation  theory 
[Wilkinson,  1965]  says  that  there  is  an  eigenvalue  X  of  B  such  that 

|  X — y  |  =  cond(y)tlEH  +  0(ilEl2)  (4.3) 

as  it E II  -»  0.  Moreover  the  condition  number  of  y  is  computable, 

cond(y)  e  =  ||qMp*I  .  (4.4) 

r  i 
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Cond(y)  is  the  secant  of  the  acute  angle  between  p  and  q.  It  is  also 
the  spectral  norm  of  the  spectral  projector  qp*  belonging  to  y.  Equations 
(4. 3)  and  (4.4)  provide  strong  incentive  for  knowing  11 E II . 

We  begin  with  a  special  case  of  the  main  theorem  given  below.  Define 
residual s 

r  e  Bx  -  xy  ,  s*  =  y*B  -  yy  (4.5) 

where 

y  =  y/liyS  ,  x  =  x/llxfl,  and  y*x  =  1  . 

COROLLARY  1.  The  distance  to  the  closest  matrix  B-E  for  which 
(y>x,y*)  is  an  eigentriple  is  B Ell p  and 

"E'f  *  M2  *  's*«2  - 

Note  that  r  and  s*  depend  on  y. 

This  result  shows  that  the  Rayleigh  quotient  p(x,y*)  is  not 

necessarily  the  best  approximate  eigenvalue  in  the  sense  of  minimizing 
2 

llEllp.  Since  II E 11  p  is  a  quadratic  in  y  a  little  algebra  yields  the  best 
value  y. 


-  _  p(x,x*)+p(y,y*)-  o(x,y*)/lly*ll2Hxll2  ^  ^ 

'  2  -  l/Hy*llZHx«2 

For  practical  purposes  it  is  worth  noting  that  when  both  IrH  and 
Rs*ll  are  small  then  the  best  value  of  y  is  close  to  p(q,p*)  and  it  is 
not  worth  going  to  the  trouble  of  finding  the  best  value. 

We  now  give  the  analogue  of  the  familiar  Rayleigh-Ritz  approximations 
from  a  subspace. 

Let  Q  and  P  be  any  n  by  m  matrices  (m<n)  satisfying  P*Q  *  Im-  The 
columns  of  Q  constitute  a  basis  for  the  subspace  span  Q  in  En  and  the 
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rows  of  P*  constitute  a  basis  for  the  subspace  span  P*  in  E*.  In  a 
sense  made  precise  below  the  best  set  of  m  approximate  eigenelements 
for  a  matrix  B  from  span  Q  and  span  P*  are  given  by  the  eigenelements 
of  the  m  by  m  matrix  p(Q,P*)  =  P*BQ  in  the  usual  way.  Specifically  to 
each  bi -orthonormal  pair  of  eigenvectors  u,v*  of  p(Q,P*)  satisfying 

P*8Qu  =  u0,  v*P*BQ  =  8v*,  8  =  v*P*BQu, 

there  corresponds  the  approximate  eigenelement  of  B 

(9,Qu,v*P*)  .  (4.7) 

In  the  symmetric  case  (with  P  =  Q)  the  word  optimal  is  justified 
by  the  fact  that  llBQ-Qfllp  is  minimized  by,  and  only  by,  the  choice 
r  =  P*BQ. 

In  the  general  case  the  word  optimal  is  justified  by  the  fact  that 
all  m  approximate  eigenelements  from  p(Q,P*)  are  exact  eigenelements 
of  that  matrix  closest  to  B  which  satisfies  the  three  natural  conditions 
given  in  the  main  theorem. 

MAIN  THEOREM.  B  is  an  n  by  n  complex  matrix,  Q  and  P  are  n  by  m  (nv<n) 
satisfying  P*Q  =  Im>  To  each  m  by  m  r  there  is  a  unique  closest  matrix 
B-E,  using  IMIp,  such  that 

(i)  (B-E)Q  *  QT  ,  [span  Q  invariant,  r  gives  spectrum], 

(ii)  P*(B-E)  =  TP*  ,  [span  P*  invariant,  r  gives  spectrum]. 

Only  the  choice  T  =  J  =  p(Q,P*)  ensures  that 

(iii)  P*(B-E)Q  *  J,  [preservation  of  p(Q,P*)]. 

Define  residual  matrices 

R  =  R(T)  2  (BQ-QD(Q*Q)'1/2  , 

s*  =  s*(r)  i  (p*p)'1/2(p*b-tp*)  . 

This  minimal  change  E  to  B  is  given  in  (4.15)  and 


(4.8) 


A 


It  is  convenient  to  use  orthonormal  bases  for  span  Q  and  span  P*  so 
we  define 

0  »  Q(Q*Q)"1/2,  P  =  P(P*P)'1/2  ,  (4.11) 

noting  that  the  square  root  of  a  positive  definite  matrix  M  is  uniquely 

2 

defined  as  the  symmetric  positive  definite  matrix  H  satisfying  H  =  M. 

Proof.  Define  the  auxiliary  matrix  Z  by 

P*R  =  (P*P)'1/2(J-r)(Q*Q)‘1/2  =  2Z  ,  (4.12) 

or 

S*0  =  (P*P)"1/2(J-r)(Q*Q)"1/2  =  2Z  .  (4.13) 

Conditions  (i)  and  (ii)  can  be  rewritten  as 

EQ  =  R,  P*E  =  S*  .  (4.14) 

Now  use  (4.12),  (4.13)  to  verify  that  one  solution  of  (4.14)  is 

E  =  (R-PZ)0*+  P(S*-ZQ*)  .  (4.15) 

By  linearity  of  the  equations  (4.14)  all  other  solutions  are  E+G  with 
G  satisfying 

GO  *  0  ,  P*G  *  0  ,  (4.16) 

We  abbreviate  trace  by  tr  and  observe  that  for  any  nxn  G 
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tr[(E+G)*(E+G)]  =  tr(E*E)  +  2tr(E*G)  +  tr(G*G)  . 
Equations  (4.15)  and  (4.16)  imply  that 
E*G  =  0(R-PX)*G  +  (S-QZ*)P*  , 

*  0R*G  -  0  +  0  , 

Now  use  tr[KL]  *  tr[LK]  and  (4.16)  to  find 

tr[E*G]  »  tr[0R*G]  =  tr[R*G0]  =  tr[0]  =  0  . 

Thus  from  (4.17) 

IE+Gllr  =  IEI?  +  ilGlf  >  «E»e 
F  F  F  —  F 

and  so  B-E  is  the  closest  matrix  satisfying  (i)  and  (i 
It  remains  to  compute  ®EHp  using  (4.12),  (4.13), 

E*E  *  0(R*R-Z*P*R-R*PZ+Z  *Z)Q* 

+  SS*  -  $Z*P*  -  PZQ*  +  QZ*ZQ*  , 

=  0(R*R-4Z*Z)0*  +  SS*  . 

Using  tr(KL)  *  tr(LK)  yields 

tr(E*E)  =  tr(R*R+S*S-4Z*Z)  . 

Consider  now  condition  (iii).  It  forces 

P*EQ  *  0  . 

Use  (4.15)  and  (4.12),  (4.13)  to  see  that 
P*EQ  =  2Z  *  0  . 

In  other  words,  r  *  J.  In  this  case  E  s  RO*  +  PS*> 


(4.17) 


(4.18) 


(4.19) 

)  and  using  fl •  Dp  . 
(4.15); 


(4.20) 


(4.21) 
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and 


E*E  =  QR*§*  +  $S* 

=  OR*RQ*  +  5(S*S)5*  , 

■  (0,5)  diag(R*R,S*S)(§,S)*  (4.22) 

By  (4.13)  the  matrix  (0,5)  is  orthonormal  and  the  2m  positive  eigenvalues 
of  E*E  are  those  of  R*R  and  S*S.  This  yields  (4.10).  a 

The  factor  4Z*Z  in  (4.20)  shows  that,  in  general,  the  choice 
r  =  P*BQ  does  not  lead  to  the  minimum  lElp  such  that  span  Q  and  span  P* 
are  invariant.  Nor  should  it  when  P*  and  Q  are  chosen  perversely.  The 
simplest  way  to  see  what  happens  is  to  consider  the  inadmissable  case 
when  span  Q  and  span  P*  are  each  invariant  but  associated  with  disjoint 
parts  of  B's  spectrum.  Since  P*Q  *  0  the  Rayleigh  quotient  is  not 
defined  and  yet  a  good  choice  for  r  is  the  set  of  eigenvalues  associated 
with  either  Q*BQ  or  P*BP.  Then  the  approximate  eigenvalues  are  exact 
and  either  R  or  S*  is  the  zero  matrix. 

When  P*Q  =  Im  then  the  matrix  QP*  is  the  spectral  projector  of  B-E 
associated  with  p(Q,P*).  Specifically  PQ*  is  one  term  in  a  partition 
of  unity 


>n  ■  I  0,1-; 

with  the  special  property 

B-E  *  J  Q^P*,  Pi  =  P*(B-E)Q^  . 

★ 

This  decomposition  is  useful  if  the  H Q7- P^B p-  are  not  too  large;  moreover 
lQP*flF<  iQl plP*J F  . 
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Whenever  HQlIpilPlI -  is  very  large  it  should  be  taken  as  a  hint  that  more 
columns  should  be  added  to  P  and  Q. 


5.  Application  to  the  Lanczos  Algorithm 

Simple  subspaces  from  which  approximations  are  sought  to  eigen- 
elements  of  an  nxn  matrix  B  are  Krylov  subspaces.  Given  a  pair  of 
starting  vectors  and  pf  satisfying  p|q1  =  1  the  Lanczos  algorithm,  by 
step  j,  produces  two  bi-orthonormal  nxj  matrices  Qj  and  Pj  which  satisfy 
in  exact  arithmetic, 

P*Q .  =  I.  ,  (5.1) 

Jw  J  J 

BVQjJj  *  (0,0,..,0,q.+13j+1)  ,  (5.2) 

pjB-jjpr(  )  (5-3) 


and  0.  is  a  j  by  j  tridiagonal  matrix 

J 


al  y2  O 


Theoretically  Q,  and  can  be  thought  of  as  the  result  of  bi -orthonormal 
«J  J 

ing,  via  Gram- Schmidt,  the  two  Krylov  sequences 
(<?!>  Bq^  B2qi,...,Bj"1q1)  , 

{ppP-|B,  p-|B  , . . ,  p^8J  }  . 

The  algorithm  comes  to  a  halt  as  soon  as  3^  =  0,  for  some  i,  but  we 
will  assume  that  such  good  luck  has  not  occurred  by  step  j. 


Section  4  showed  that  there  Is  good  reason  to  seek  approximate 

★ 

eigentriples  from  the  projection  of  B  along  span  Qi  and  span  P.,  namely 

J  J 

Jj  ■  P>j  (5-5) 

Let  us  examine  a  typical  eigenvalue  6;  say 


J,z  =  z  9,  w*Ji  *  9w*,  w*z  =  1  . 

J  J 


(5.6) 


We  suppress  the  dependence  of  9,  z,  and  w*  on  j .  Now  multiply  (5.2)  and 
(5.3)  by  z  and  w*.  and  introduce  two  important  quantities  c.  and  u.,  the 

J  J 

last  elements  of  z  and  w*,  to  find  that 


B(QjZ)  -  (Qjl)e  •  . 

(5.7) 

(w*P*)B  -  9(w*Pj)  •  Vj+,P5+,  . 

(5.8) 

The  approximate  eigenvectors  are  x  =  Q.z,y*  *  w*P^  and,  by  applying  the 

J  J 

main  theorem  with  m  =  1  to  (5.7)  and  (5.8)  we  obtain  a  computable 
expression  for  the  error. 


COROLLARY.  The  closest  matrix  to  B  with  (9,x,y*)  as  an  eigenelementis 
B-E  and 


I  EH 


max|- 


3  C,- 1  iqj+-|  i 


^xT 


(5.9) 


From  (4.15)  E  is  the  rank  two  matrix 


E  »  (^^-)qj+1x*  +  (--^-  gJ)yp 
IxP  J  1  lyl2 

The  object  of  Interest,  I X-Q | , 


J+l  • 

is  unknown  but  when 


1 E B  is  small 


enough  then  (4.3)  applies, 
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I  X-e  |  =  cond  ( 0 )  H  ESI  +  0  ( !1  E  !l  ” ) 


(5.10) 


and,  by  (4.4) , 

cond(9)  =  llxll  11  y  *  11  .  (5.11) 

Consequently  the  Lanczos  algorithm  should  be  continued  until  both  IIEU 

is  small  and  cond(9)  - II E 11  is  below  the  given  threshold  for  accuracy. 

For  j  <<  n  (say  j  =  10  viT)  the  computation  of  eigenvalues  of  the 

tridiagonal  J.  is  modest  compared  with  the  cost  of  a  Lanczos  step.  In 

principle  x  and  y*  are  computable  but  this  would  require  the  use  of 

the  matrices  and  P^. ;  that  cost  is  2jn  operations  and  probably  exceeds 

the  cost  of  a  Lanczos  step.  However  it  is  advisable  to  keep  in  the 

fast  store  the  quantities  llQ.II?  =  f  !lq.  H2  and  f  !lp*il2,  which  are 

J  F  i=l  1  J  F  1-1  1 

easy  to  update.  A  computable  bound  on  cond(6)  is  then 

cond(3)  =  3xillly*!l  <_  H Q  . D  i! p*U  B z II  II w* II  (5.12) 

A  way  to  use  the  error  estimates  in  practice  is  described  at  the  end 
of  the  section.  The  error  (5.9)  continues  to  hold  in  practice,  apart  from 
roundoff  terms,  because  no  use  was  made  of  (5.1)  which  fails  completely 
in  finite  precision. 

Persistance  of  Ritz  Values 

The  eigenvalues  of  the  tridiagonal  are,  in  general,  distinct 
from  those  of  (In  the  symmetric  case  the  two  sets  interlace  each 
other.)  Nevertheless  for  large  enough  k  some  of  these  Ritz  values  change 
by  negligible  amounts  when  k  increases.  Which  ones?  Certainly  among 
them  are  the  Ritz  values  which  have  already  stabilized  at  eigenvalues 
of  3.  This  fact  is  a  straightforward  extension  of  the  symmetric  version 
by  C.  C.  Paige.  The  key  observation  is  that  the  corresponding  eigenvectors 

do  not  change  much  as  k  increases. 
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We  shall  need  some  extra  notation.  For  any  vector  v  let  v  denote 
the  vector  (g)  where  0  elements  are  appended  to  v  to  make  v  have  the 
appropriate  dimension  for  the  context  in  which  it  is  used. 


COROLLARY  OF  MAIN  THEOREM. 

Let  (0,z,w*)  be  an  eigentriple  of  the  tridiagonal  matrix  J.  of 

J 

(5.4)  with  w*z  =  1.  Then,  for  all  k  >  j,  (0,z,w*)  is  an  eigentriple  of 
Jk-Gk  and 


Gi,  ■  (~ +  (-+1^-)we* 


1 2  /cj+l 


I  w*ll 


2  '  j+1 


Moreover 


II G.  II  =  max 


(-ItlllJ.  ,  ilitpi)  .  JG  , 

l  ||Z||  ’  u w*||  J  *  k  F 


,  IVjl 


lzl‘ 


Hw*n 


*u 2 


and  both  values  are  independent  of  k. 


Proof.  The  residual  vectors  for  z  with  respect  to  Jk  is  readily  verified 
to  be 


where 

“  =  0j+l^ejz)  =  8j+l^j  * 

and  II zll  *  llzll.  Similarly  for  s*.  The  corollary  results  from  using  these 
expressions  for  the  residuals  in  the  main  theorem  in  Section  4.  » 

The  application  is  immediate  and  remarkable.  When  k.l  and  |wJ 

J  J 

have  dwindled  sufficiently  so  that  cond(0)  * G j 8  <  tol  then  3  is  a 
stabilization  (or  condensation)  point  for  the  Lanczos  process.  There  is 


-21- 


a  matrix  within  to!  of  Jk  which  has  0  in  its  spectrum  for  all  k  >  j. 


Use  of  Error  Estimates 

At  the  jth  step  of  the  Lanczos  process  there  is  freedom  in  the 
choice  of  8j+^  and  Yj+1  although  the  quantities  Sj+1Yj+^»  !  ®j+l  I  ^j+i  **  > 
lYj+T^Pj+l 11  are  aH  determined.  In  the  corollary  given  above  we  may 
pretend  that  6  and  y  are  scaled  so  that 


II  G^Il  =  max{ 


“Till — 


_  r|6j+lYj+l?fj 
1 .  Hzf  'lw*il — 


1/2 

} 


(5.13) 


With  respect  to  J.  the  condition  number  of  0  is 

J 

cond(0 ; J. )  =  I zO  llw*»  .  (5.14) 

J 

Moreover,  for  all  k  >  j, 


cond(9;Jk-Gk)  =  #zll  Hw*H 


cond(0; Jj) 


Here  is  an  economical  way  to  test  for  termination  when  seeking 
an  eigenvalue  to  within  to  1  * II  B 11 . 

1.  Advance  the  Lanczos  process,  computing  appropriate  eigenvalues  of 
J  at  each  step,  until  the  one  (or  ones)  of  interest  stabilizes  to  the 
required  accuracy.  Call  it  9  and  let  the  step  number  be  j. 

2.  If  cond(9;J.)  is  too  big  then  abandon  the  assumption  that  9  approximates 

J 

a  simple  eigenvalue  of  B.  Find  a  cluster  of  eigenvalues  of  and 

"  J 

associated  eigenvector  matrices  Z  =  (z^.Zg,..)  and  W  =  (W^W^,...)  so 
that  W*Z  =  I  and  II Z 11 F  11 W* II ^  is  acceptable.  Then  use  the  more  general 
estimates  of  the  main  theorem  with  Q.Z  in  place  of  Q  and  P^W  in  place 

J  J 

of  P,  W*J.Z  in  place  of  J. 

J 
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3.  If  cond(9;J.)  is  acceptable  (?)  then  test  the  statement 

!  3j+1Yj+1CjUJj  !  Ilzill,w*i!  £  (tol  i!  Jj11  p)  (5.20) 

The  left  side  is  cond(3 , J^-G^)  IlG^il,  for  k  >  j. 

4.  If  (5.20)  fails  then  continue  the  Lanczos  process,  other,' vise  test 
cond(9  ;B-E)  •  3  E!l  using  (5.9),  (5.11),  (5.12),  namely  test 


max{ 


3J>1 


"j'f 


ilw” 


i  y  i 


j+n 


!|pj*+i; 


ilo. 


!!zi! } 


<  tol  il J  II  (5.21) 

J  * 

5.  If  (5.21)  fails  then  continue  the  Lanczos  process,  otherwise  compute 
x  *  QjZ,  y*  =  w*??,  and  deliver  max{ i 3j+14j i  ly*l ,  lYj+iujl 

llpt+1ll  !l x!l }  as  the  error  estimate  for  3. 

Conments 

(i)  We  do  not  know  what  numerical  values  should  be  used  to 
discriminate  between  simple  eigenvalues  and  perturbations  of  multiple 
ones.  Please  consult  [Golub  and  Wilkinson,  1976],  [A.  Ruhe,  1970], 

[Varah,  1970]. 

(ii)  The  costly  formation  of  x  and  y*  is  not  made  until  3  is 
acceptable. 

(iii)  These  tests  may  be  used  in  finite  precision.  The  failure  of 
PjQj  =  Ij  means  that  Jj  is  not  the  optimal  projection  of  B.  Nevertheless 
it  is  still  a  good  approximation. 
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APPENDIX 


Proof  of  Theorem  8.  B  =  FJF~^  ,  C  =  B-E. 

We  begin  by  immitating  the  proof  of  the  Bauer-Fike  theorem.  By 
definition  of  y,  B-E-yl  is  singular.  If  y  is  an  eigenvalue  of  B  the 
bound  holds  trivially.  So  we  consider  the  contrary  case  when  the  matrix 

D  =  J-yl 

is  invertible.  We  know  that 
F(D-F'1EF)F'1 

is  singular.  It  follows  that  II D  V’^EFll  >_  1  for  any  norm  in  which  Hill  =  1. 
Thus 

1  /II D”1  II  <  II  F-1EFll  <  II F"1  II II  Fll  II  Ell  . 

It  remains  to  estimate  II D” 1 II .  We  focus  on  the  spectral  norm  and  recall 

that  D  is  a  direct  sum  of  Jordan  blocks  and  so  1/IID~^II  is  the  smallest 

singular  value  among  all  the  blocks.  For  a  typical  block  the  smallest 

2 

singular  value  a  satisfies  a  =  A^(T)  =  Xm^n(T)  for  a  tridiagonal  matrix 


Here  6  =  |A-yj,  for  some  eigenvalue  A  of  B  and  T  is  diagonally  similar 
to  LL*  where  L  is  a  typical  block  of  D.  T  is  positive 
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definite,  but  only  just  so.  An  asymptotically  correct  lower  bound  on 
its  smallest  eigenvalue  can  be  obtained  quite  simply  from  the  observation 

that  det  T  =  | det  LL  |  =  |det  L[^  =  5^m.  By  Gersgorin's  Theorem  all  T's 

P  2 

eigenvalues  satisfy  X.  <  1  +  6  +  26  =  (1+6)  .  Thus 

xi  =  Xr-'V(V"\J  ’ 

=  det  T/(X2...Xm)  , 

>  52m/ ( 1 +6 ) 2m“2  . 

It  follows  that  1/IID~^II  is  greater  than  the  smallest  of  the  expressions 
6m/(l+5)m"^  over  all  the  blocks.  So  for  someAand  its  associated  largest 
block  size  m, 

- 1a~h1  -t  <  —L-  <  IIFEF"1H  .  c 

(1  +  |  X-y[ )  “  II D”  II  ~ 
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